An empirical study of datapath, memory hierarchy, and network in SIMD array architectures
Abstract
Although SIMD arrays have been built for 30 years, they have as a class been the subject of few empirical design studies. Using ENPASSANT, a simulation environment developed for that purpose, we analyze several aspects of SIMD array architecture with respect to a test suite of spatially mapped applications. Several surprising results are obtained. With respect to memory hierarchy, we find that adding a level of cache to current PE designs is likely to be advantageous, but that such a cache will look quite different than expected. In particular, we find that associativity has unusual significance and that performance varies inversely with block size. Router network results indicate the importance of support for local transfers, broadcast, and reduction even at the expense of arbitrary permutations. Other communication results point to the appropriate dimensionality of k-ary n-cube networks (2 or 3), and the criticality of supporting bidirectional transfers, even if the overall bandwidth remains unchanged.
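For readers unfamiliar with the k-ary n-cube terminology used in the abstract, the following is a minimal sketch, not taken from the paper or from ENPASSANT, of how such a network can be modeled. It assumes the standard textbook definition (n dimensions, k nodes per dimension, wraparound links); the function names `neighbors` and `diameter` are illustrative only.

```python
def neighbors(node, k, bidirectional=True):
    """Return the direct neighbors of `node` in a k-ary n-cube.

    `node` is a tuple of n coordinates, each in range(k). Links wrap
    around in every dimension. With bidirectional=False, each dimension
    only has a link in the +1 direction (an assumption of this sketch).
    """
    steps = (1, -1) if bidirectional else (1,)
    result = []
    for dim in range(len(node)):
        for step in steps:
            coords = list(node)
            coords[dim] = (coords[dim] + step) % k
            neighbor = tuple(coords)
            # Skip self-links (k == 1) and duplicates (k == 2).
            if neighbor != node and neighbor not in result:
                result.append(neighbor)
    return result


def diameter(k, n, bidirectional=True):
    """Worst-case hop count between two nodes of a k-ary n-cube."""
    # Bidirectional rings need at most k // 2 hops per dimension;
    # unidirectional rings may need to travel almost all the way round.
    per_dim = k // 2 if bidirectional else k - 1
    return n * per_dim


if __name__ == "__main__":
    print(neighbors((0, 0), k=4))                   # 2-D torus, 4 per side
    print(diameter(k=4, n=2))                       # bidirectional links
    print(diameter(k=4, n=2, bidirectional=False))  # unidirectional links
```

In this simple model, a 4-ary 2-cube has a diameter of 4 hops with bidirectional links and 6 hops with unidirectional ones. This gives one rough intuition for why link direction can matter independently of total bandwidth, though the paper's own conclusions rest on its simulations, not on this calculation.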
Related articles
Using Emulations to Enhance the Performance of Parallel Architectures
We illustrate the potential of techniques and results from the theory of network emulations to enhance the performance of a parallel architecture. The vehicle for this demonstration is a suite of algorithms that endow an N-processor bit-serial processor array A with a "meta-instruction" GAUGE k, which (logically) reconfigures A into an N/k-processor virtual machine B_k that has: 1) a datapath a...
Flexible Parallel Processing in Memory: Architecture + Programming Model
VLSI technology continues to develop at a staggering rate, presenting two challenges to computer designers: (i) how to capitalize on the additional resources that are available on a chip; and (ii) how to evolve computer architecture models that are well matched to the significantly changed physical parameters of new technology and the expanding needs of applications. One of the chief challenges i...
Size Tradeoffs in the Design of SIMD Arrays for a Spatially Mapped Workload
Though massively parallel SIMD arrays continue to be promising for many computer vision applications, they have undergone few systematic empirical studies. The problems include the size of the architecture space, the lack of portability of the test programs, and the inherent complexity of simulating up to hundreds of thousands of processing elements. The latter two issues have been addressed pr...
A Sub-mW H.264 Baseline-Profile Motion Estimation Processor Core with a VLSI-Oriented Block Partitioning Strategy and SIMD/Systolic-Array Architecture
We propose a sub-mW H.264 baseline-profile motion estimation processor for portable video applications. It features a VLSI-oriented block partitioning strategy and low-power SIMD/systolic-array datapath architecture, where the datapath can be switched between an SIMD and systolic array depending on processing flow. The processor supports all the seven kinds of block modes, and can handle three r...
Pentium III Processor Implementation Tradeoffs
This paper discusses the implementation tradeoffs of the Pentium III processor. The Pentium III processor implements a new extension of the IA-32 instruction set called the Internet Streaming Single-Instruction, Multiple-Data (SIMD) Extensions (Internet SSE). The processor is based on the Pentium Pro processor microarchitecture. The initial development goals for the Pentium III processor were ...
Publication date: 1995